pacman::p_load(sf, tidyverse, tmap, sfdep, lubridate, plotly, dplyr)Take-home Exercise 2
1. Overview
Dengue Fever, or Dengue Hemorrhagic Fever, is a prevalent mosquito-borne illness found in tropical and subtropical regions. It stems from acute dengue virus infection transmitted by female Aedes aegypti and Aedes albopictus mosquitoes. In 2015, Taiwan faced a severe outbreak, recording over 43,000 cases and 228 deaths. Subsequently, annual cases remained below 200. However, in 2023, Taiwan saw a spike with 26,703 reported cases, notably over 25,000 in Tainan City.
2. The Task
Our task is to explore whether the occurrence of dengue fever outbreaks in Tainan City is influenced by spatial factors alone or both spatial and temporal factors. If the outbreaks show dependence on space or both space and time, the goal is to identify clusters, outliers, and emerging hotspots/coldspots within the area.
3. The Data
This study will utilize two datasets, which are:
Geospatial data of village boundary of Taiwan in the format of an ESRI shapefile. The data is in the Taiwan Geographic Coordinate System.
Aspatial data of reported dengue cases in Taiwan since 1998.
4. Installation and Loading of Packages
The R packages that will be used in this study are:
sf: for importing, managing, and processing geospatial data
tidyverse: for performing data science tasks
tmap: for plotting cartographic quality static point patterns maps or interactive maps by using leaflet API
sfdep: supports spatial dependence analysis in spatial econometrics using R.
lubridate: provides functions for working with dates and times
plotly: for interactive and dynamic plotting in R
dplyr: for wrangling data
5. Import Data
5.1 Importing geospatial data
The st_read() function is associated with the sf package in R. This function is used to read spatial data from various formats. When using the st_read function to read a shapefile, there is no need to explicitly specify the file format.
tainan_sf <- st_read(dsn = "../../data/th2/data/geospatial",
layer = "TAINAN_VILLAGE")Reading layer `TAINAN_VILLAGE' from data source
`C:\viddyasri\IS415-GAA\data\th2\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 649 features and 10 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 120.0269 ymin: 22.88751 xmax: 120.6563 ymax: 23.41374
Geodetic CRS: TWD97
Now, to understand the shapefile imported, let’s retrieve information related to the data frame.
st_geometry(tainan_sf)Geometry set for 649 features
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 120.0269 ymin: 22.88751 xmax: 120.6563 ymax: 23.41374
Geodetic CRS: TWD97
First 5 geometries:
head(tainan_sf, n=10)Simple feature collection with 10 features and 10 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 120.0269 ymin: 22.93251 xmax: 120.2905 ymax: 23.17218
Geodetic CRS: TWD97
VILLCODE COUNTYNAME TOWNNAME VILLNAME VILLENG COUNTYID COUNTYCODE
1 67000280002 臺南市 歸仁區 六甲里 Liujia Vil. D 67000
2 67000350032 臺南市 安南區 青草里 Qingcao Vil. D 67000
3 67000150009 臺南市 七股區 溪南里 Xinan Vil. D 67000
4 67000150010 臺南市 七股區 七股里 Qigu Vil. D 67000
5 67000150008 臺南市 七股區 龍山里 Longshan Vil. D 67000
6 67000150017 臺南市 七股區 中寮里 Zhongliao Vil. D 67000
7 67000150004 臺南市 七股區 篤加里 Dujia Vil. D 67000
8 67000150007 臺南市 七股區 塩埕里 Yancheng Vil. D 67000
9 67000150022 臺南市 七股區 三股里 Sangu Vil. D 67000
10 67000150023 臺南市 七股區 十份里 Shifen Vil. D 67000
TOWNID TOWNCODE NOTE geometry
1 D33 67000280 <NA> POLYGON ((120.2725 22.95868...
2 D06 67000350 <NA> POLYGON ((120.1176 23.08387...
3 D22 67000150 <NA> POLYGON ((120.121 23.1355, ...
4 D22 67000150 <NA> POLYGON ((120.1312 23.1371,...
5 D22 67000150 <NA> POLYGON ((120.0845 23.13503...
6 D22 67000150 <NA> POLYGON ((120.126 23.16917,...
7 D22 67000150 <NA> POLYGON ((120.1585 23.15376...
8 D22 67000150 <NA> POLYGON ((120.0636 23.17069...
9 D22 67000150 <NA> POLYGON ((120.0422 23.11545...
10 D22 67000150 <NA> POLYGON ((120.1201 23.09364...
We will use plot to create basic initial visualizations of the geospatial data and use tmap for a more detailed distribution of Tainan village and its counties.
plot(tainan_sf)
tmap_mode('plot')
tm_shape(tainan_sf) +
tm_polygons("TOWNID")
The number of levels of the variable “TOWNID” is 37, which is larger than the maximum number of categories (which is 30), thus the levels are combined.
5.2 Importing aspatial data
The read_csv() function is associated with the tidyverse package in R. This function is employed to read tabular data in CSV (Comma-Separated Values) format. When using the read_csv function, there is a need to explicitly specify the file format.
dengue_csv <- read_csv("../../data/th2/data/aspatial/Dengue_Daily.csv")For the aspatial data, only three columns are of relevance to this exercise. The three columns are
- 發病日: Onset date
- 最小統計區中心點X: x-coordinate
- 最小統計區中心點Y: y-coordinate
Therefore, we will be extracting only those three columns and renaming them for ease of further evaluation.
dengue_csv <- dengue_csv %>%
select(發病日, 最小統計區中心點X, 最小統計區中心點Y) %>%
rename(ONSET_DATE = "發病日", X_COORDINATE = "最小統計區中心點X", Y_COORDINATE = "最小統計區中心點Y")Now, to understand the confined csv created, let’s retrieve information related to the it.
head(dengue_csv, n=10)# A tibble: 10 × 3
ONSET_DATE X_COORDINATE Y_COORDINATE
<date> <chr> <chr>
1 1998-01-02 120.505898941 22.464206650
2 1998-01-03 120.453657460 22.466338948
3 1998-01-13 121.751433765 24.749214667
4 1998-01-15 120.338158907 22.630316700
5 1998-01-20 121.798235373 24.684507639
6 1998-01-22 None None
7 1998-01-23 121.547480075 24.982467229
8 1998-01-26 121.500936346 25.139302723
9 1998-02-11 120.209079313 22.978859098
10 1998-02-16 120.313207729 22.724216594
From the above, we can see that some rows are missing values. We will look further into this matter in the data preparation section.
6. Data Preparation - Study Area Layer
In this section, we need to prepare a study area layer consisting of sf polygon features at the village level that include specific counties, namely D01, D02, D04, D06, D07, D08, D32, and D39 of Tainan City, Taiwan.
6.1 Handle invalid geometries
First, we check for invalid geometries and handle them accordingly.
length(which(st_is_valid(tainan_sf) == FALSE))[1] 0
As shown above, there are no invalid geometries.
6.2 Check projection system
In this exercise, we aim to utilize Taiwan’s national projected coordinate system, commonly known as TWD97. Here, we verify whether our data frame is aligned with this specific projected coordinate system.
st_crs(tainan_sf)Coordinate Reference System:
User input: TWD97
wkt:
GEOGCRS["TWD97",
DATUM["Taiwan Datum 1997",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["Taiwan, Republic of China - onshore and offshore - Taiwan Island, Penghu (Pescadores) Islands."],
BBOX[17.36,114.32,26.96,123.61]],
ID["EPSG",3824]]
Upon inspecting the dataframe, we can determine that it is utilizing the TWD97 coordinate system with the EPSG code 3824. Given this alignment, there is no requirement for a projection transformation.
6.2 Preparing the study area layer
To confine the study area to specific counties, the following code accomplishes this:
We begin by identifying the town IDs we want to include in the filtering process.
town_ids_to_include <- c("D01", "D02", "D04", "D06", "D07", "D08", "D32", "D39")Subsequently, we create a new data frame by extracting rows from the original data frame that meet the condition of having the town IDs specified for inclusion.
tainan_sf_confined <- tainan_sf %>%
filter(TOWNID %in% town_ids_to_include)Now, we plot to see if we’ve successfully confined the data frame to the specified counties and to visually represent the distribution of these counties on the map.
tmap_mode('plot')
tm_shape(tainan_sf_confined) +
tm_polygons("TOWNID")
As seen in the tmap above, we have successfully confined the data frame to the specified counties.
7. Data Preparation - Dengue Fever Layer
In this section, we need to prepare an sf point layer representing dengue fever cases within the designated study area. The inclusion criteria for these cases entail confining them to epidemiology weeks 31-50 of the year 2023.
7.1 Filtering data
Given that our analysis is specifically focused on the year 2023, let us first filter the dataset to isolate information from that specific year. This is crucial, as the existing CSV file contains data spanning from 1998, which is beyond the scope of our current investigation.
First, we transform the “ONSET_DATE” column into a date object.
dengue_csv$ONSET_DATE <- as.Date(dengue_csv$ONSET_DATE)Subsequently, we proceed to filter and extract the data entries corresponding to the year 2023.
dengue_csv_confined <- dengue_csv[dengue_csv$ONSET_DATE >= as.Date("2023-01-01") & dengue_csv$ONSET_DATE <= as.Date("2023-12-31"), ]Following that, we proceed to extract the week information from the “ONSET_DATE” column, utilizing this information to selectively filter dengue cases specifically for epidemiology weeks 31 through 50.
dengue_csv_confined$EPIDEMIOLOGY_WEEK <- week(dengue_csv_confined$ONSET_DATE)dengue_csv_confined <- dengue_csv_confined %>%
filter(EPIDEMIOLOGY_WEEK >= 31 & EPIDEMIOLOGY_WEEK <= 50)Let’s take a look at we have now:
head(dengue_csv_confined, n=10)# A tibble: 10 × 4
ONSET_DATE X_COORDINATE Y_COORDINATE EPIDEMIOLOGY_WEEK
<date> <chr> <chr> <dbl>
1 2023-07-30 120.220182286 22.976075790 31
2 2023-07-30 120.218036763 22.980068070 31
3 2023-07-30 120.235449178 23.013736726 31
4 2023-07-30 120.245579790 23.019322681 31
5 2023-07-30 120.246524983 23.018206041 31
6 2023-07-30 120.226597184 23.016947543 31
7 2023-07-30 120.459990679 23.599175697 31
8 2023-07-30 120.234949811 23.015998065 31
9 2023-07-30 120.234949811 23.015998065 31
10 2023-07-30 120.247723767 23.011129228 31
7.2 Handle missing values
In our previous inspection of csv data, we noticed that certain rows within the dataset exhibited “None” values in the X_COORDINATE and Y_COORDINATE columns. Now, we aim to quantify the number of these rows with missing values specifically for the year 2023 data.
none_column_X <- dengue_csv_confined[dengue_csv_confined$X_COORDINATE == "None", ]
none_column_Y <- dengue_csv_confined[dengue_csv_confined$Y_COORDINATE == "None", ]Below are the missing rows:
print(none_column_X)# A tibble: 14 × 4
ONSET_DATE X_COORDINATE Y_COORDINATE EPIDEMIOLOGY_WEEK
<date> <chr> <chr> <dbl>
1 2023-08-18 None None 33
2 2023-08-24 None None 34
3 2023-09-15 None None 37
4 2023-09-15 None None 37
5 2023-09-19 None None 38
6 2023-09-23 None None 38
7 2023-09-26 None None 39
8 2023-09-28 None None 39
9 2023-09-30 None None 39
10 2023-10-17 None None 42
11 2023-10-21 None None 42
12 2023-10-26 None None 43
13 2023-10-28 None None 43
14 2023-12-08 None None 49
print(none_column_Y)# A tibble: 14 × 4
ONSET_DATE X_COORDINATE Y_COORDINATE EPIDEMIOLOGY_WEEK
<date> <chr> <chr> <dbl>
1 2023-08-18 None None 33
2 2023-08-24 None None 34
3 2023-09-15 None None 37
4 2023-09-15 None None 37
5 2023-09-19 None None 38
6 2023-09-23 None None 38
7 2023-09-26 None None 39
8 2023-09-28 None None 39
9 2023-09-30 None None 39
10 2023-10-17 None None 42
11 2023-10-21 None None 42
12 2023-10-26 None None 43
13 2023-10-28 None None 43
14 2023-12-08 None None 49
Given the uncertainty about the specific locations of the points with missing latitude or longitude values, it is advisable not to arbitrarily input latitude or longitude, as this may lead to inaccurate representations. Considering that the missing values constitute only 14 out of 25,479 observations, removing these observations could be a reasonable approach to ensure the accuracy of our spatial analysis. The missing values constitute a small proportion of the total observations and the impact on the overall analysis can be considered negligible.
Here, we eliminate rows where the coordinates are marked as “None” and converting coordinates to a numeric type to ensure that they can be used for spatial operations and mapping:
dengue_csv_confined <- dengue_csv_confined %>%
filter(X_COORDINATE != "None" & Y_COORDINATE != "None") %>%
mutate(
X_COORDINATE = as.numeric(X_COORDINATE),
Y_COORDINATE = as.numeric(Y_COORDINATE)
)7.3 Converting data into a sf object
Now, we can proceed with the conversion of the cleaned data into an sf data frame as well as ensuring that it is projected to the right coordinate system:
dengue_sf <- st_as_sf(dengue_csv_confined, coords = c("X_COORDINATE", "Y_COORDINATE"), crs = 4326) %>% st_transform(crs = 3824)st_geometry(dengue_sf)Geometry set for 25461 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 118.3081 ymin: 21.98551 xmax: 121.8983 ymax: 25.20237
Geodetic CRS: TWD97
First 5 geometries:
head(dengue_sf)Simple feature collection with 6 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 120.218 ymin: 22.97608 xmax: 120.2465 ymax: 23.01932
Geodetic CRS: TWD97
# A tibble: 6 × 3
ONSET_DATE EPIDEMIOLOGY_WEEK geometry
<date> <dbl> <POINT [°]>
1 2023-07-30 31 (120.2202 22.97608)
2 2023-07-30 31 (120.218 22.98007)
3 2023-07-30 31 (120.2354 23.01374)
4 2023-07-30 31 (120.2456 23.01932)
5 2023-07-30 31 (120.2465 23.01821)
6 2023-07-30 31 (120.2266 23.01695)
We have successfully transformed the csv data into an sf point data frame, using the TWD97 projected coordinate system. Let’s do some initial visualizations of these points using plot() and tmap.
plot(dengue_sf, main = "Dengue Fever Cases (Epi Weeks 31-50, 2023)")
tmap_mode("plot")
tm_shape(dengue_sf) +
tm_dots()
It is important to note that the points depicted in the diagrams above haven’t been constrained to the study area; instead, they encompass the entire Tainan City. Further steps are required to refine the visualization and focus exclusively on the specified study area.
7.4 Confining dengue case layer within the study area
The dengue fever layer now needs to be restricted to the previously defined study area. This can be achieved by employing the st_intersection function. The st_intersection function essentially determines the common spatial elements between two layers, in this case, the dengue fever layer and the predefined study area. As a result of this operation, the output will consist of spatial data that exclusively pertains to the overlapping regions between the two layers.
The code below has been commented out due to its extended execution time, and instead, the resultant dengue fever layer has been loaded into the environment for further use.
#dengue_2023_tainan <- st_intersection(tainan_sf_confined, dengue_sf)
#write_rds(dengue_2023_tainan, "../../data/rds/dengue_2023_tainan.rds")dengue_2023_tainan <- read_rds("../../data/rds/dengue_2023_tainan.rds")Here is a glimpse of the resulting dengue fever layer:
head(dengue_2023_tainan, n=10)Simple feature collection with 10 features and 12 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 120.1982 ymin: 22.97283 xmax: 120.2669 ymax: 23.04125
Geodetic CRS: TWD97
VILLCODE COUNTYNAME TOWNNAME VILLNAME VILLENG COUNTYID
211 67000310027 臺南市 永康區 龍潭里 Longtan Vil. D
175 67000340044 臺南市 北區 長勝里 Zhangsheng Vil. D
145 67000310036 臺南市 永康區 尚頂里 Shangding Vil. D
169 67000310041 臺南市 永康區 龍埔里 Longpu Vil. D
241 67000310031 臺南市 永康區 六合里 Liuhe Vil. D
19 67000310001 臺南市 永康區 五王里 Wuwang Vil. D
63 67000320008 臺南市 東區 東光里 Dongguang Vil. D
32 67000320032 臺南市 東區 崇德里 Chongde Vil. D
182 67000370042 臺南市 中西區 小西門里 Xiaoximen Vil. D
239 67000310026 臺南市 永康區 勝利里 Shengli Vil. D
COUNTYCODE TOWNID TOWNCODE NOTE ONSET_DATE EPIDEMIOLOGY_WEEK
211 67000 D39 67000310 <NA> 2023-07-29 31
175 67000 D04 67000340 <NA> 2023-07-29 31
145 67000 D39 67000310 <NA> 2023-07-29 31
169 67000 D39 67000310 <NA> 2023-07-29 31
241 67000 D39 67000310 <NA> 2023-07-29 31
19 67000 D39 67000310 <NA> 2023-07-29 31
63 67000 D01 67000320 <NA> 2023-07-29 31
32 67000 D01 67000320 <NA> 2023-07-29 31
182 67000 D08 67000370 <NA> 2023-07-29 31
239 67000 D39 67000310 <NA> 2023-07-29 31
geometry
211 POINT (120.2669 23.02954)
175 POINT (120.2196 23.0086)
145 POINT (120.2223 23.02192)
169 POINT (120.2594 23.04125)
241 POINT (120.2317 23.01115)
19 POINT (120.2357 23.01308)
63 POINT (120.2291 22.99008)
32 POINT (120.2265 22.97283)
182 POINT (120.1982 22.98234)
239 POINT (120.2323 23.00397)
7.5 Visualizing the dengue fever layer within the study area
Let’s visualize the dengue cases within the study area using tmap.
tmap_mode("plot")
tm_shape(tainan_sf_confined) +
tm_borders(lwd = 1, col = "#C09984") +
tm_shape(dengue_2023_tainan) +
tm_dots(col = "#2B2B2B", size = 0.05) +
tm_layout(title = "Dengue Cases within Study Area",
title.position = c("center", "top"),
title.size = 0.8,
frame = FALSE)
tmap_mode('plot')
tm_shape(tainan_sf_confined) +
tm_polygons("TOWNID") +
tm_shape(dengue_2023_tainan) +
tm_dots(col = "#2B2B2B", size = 0.05) +
tm_layout(title = "Dengue Cases within Study Area",
title.position = c("center", "top"),
title.size = 0.8,
frame = FALSE,
legend.position = c("left", "bottom"),
legend.title.size = 0.7,
legend.text.size = 0.6)
From the visual analysis of the maps provided, a noticeable clustering pattern is evident in the central region of the study area. On the contrary, points located in the regions away from the center appear to be more dispersed.
8. Choropleth Mapping and Analysis
In this section, we will create and visualize a choropleth map representing the occurrences of dengue fever cases specifically within the confined Tainan study layer.
Before we visualize the map, some data processing needs to be done. First, we aggregate the dengue fever cases by village code, creating new data with the summarized information. The dengue fever cases are referred to as observations in the code. The result is then converted to a standard data frame and the geometry column is removed for a more straightforward join operation later.
grouped_data <- dengue_2023_tainan %>%
group_by(VILLCODE) %>%
summarize(OBSERVATIONS_SUM = n())
grouped_data <- as.data.frame(grouped_data)
grouped_data <- grouped_data %>%
select(-geometry)Then, we merge this data with the confined Tainan study layer based on the village code. The result is a modified spatial data frame (dengue_2023_tainan) with additional information about the number of dengue fever observations for each village.
dengue_2023_tainan <- left_join(tainan_sf_confined, grouped_data, by = "VILLCODE")
#removed as column is irrelevant
dengue_2023_tainan <- dengue_2023_tainan %>%
select(-NOTE)Let’s take a look at the result:
glimpse(dengue_2023_tainan)Rows: 258
Columns: 11
$ VILLCODE <chr> "67000350032", "67000270011", "67000370005", "6700033…
$ COUNTYNAME <chr> "臺南市", "臺南市", "臺南市", "臺南市", "臺南市", "臺…
$ TOWNNAME <chr> "安南區", "仁德區", "中西區", "南區", "安南區", "安南…
$ VILLNAME <chr> "青草里", "保安里", "赤嵌里", "大成里", "城北里", "城…
$ VILLENG <chr> "Qingcao Vil.", "Bao'an Vil.", "Chihkan Vil.", "Dache…
$ COUNTYID <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D"…
$ COUNTYCODE <chr> "67000", "67000", "67000", "67000", "67000", "67000",…
$ TOWNID <chr> "D06", "D32", "D08", "D02", "D06", "D06", "D08", "D06…
$ TOWNCODE <chr> "67000350", "67000270", "67000370", "67000330", "6700…
$ OBSERVATIONS_SUM <int> 2, 19, 111, 29, 1, 10, 38, 44, 112, 65, 28, 2, 3, 11,…
$ geometry <POLYGON [°]> POLYGON ((120.1176 23.08387..., POLYGON ((120…
Now, we are ready to create our choropleth map! This map visualizes the distribution of dengue cases per village for epidemiology weeks 31 to 50 for the year 2023.
tmap_mode("plot")
tm_shape(dengue_2023_tainan) +
tm_fill("OBSERVATIONS_SUM",
style = "quantile",
palette = "Purples",
title = "Observations") +
tm_layout(main.title = "Distribution of Dengue Cases per Village",
main.title.position = "center",
main.title.size = 0.8,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2)
Similar to the previous observation in the tmap, we can discern a pattern where the outer regions of the confined map exhibit a lower number of recorded cases. As we move towards the central regions of the confined map, there is a noticeable escalation in the number of cases, with varying intensities, denoted by the darker gradients surrounding the center. In the outer regions, villages with low dengue fever case records are predominantly surrounded by other villages with similarly low cases. As we move towards the center, we observe instances where villages with high case numbers are surrounded by villages exhibiting both high and low case counts. To gain deeper insights, let’s conduct additional analyses.
9. Global Spatial Autocorrelation
Global spatial autocorrelation is a statistical measure that quantifies the degree to which similar values of a variable are clustered or dispersed across spatial units (e.g., regions, points, polygons). In simpler terms, it helps determine if neighboring locations tend to have similar or dissimilar values. The most common metric for global spatial autocorrelation is the Moran’s I statistic.
9.1 Deriving contiguity with Queen’s method
Before proceeding with global spatial autocorrelation analysis, it is crucial to establish spatial contiguity, or in other words, neighborhood relationships among the spatial units in your study area. In this exercise, we will utilize the Queen’s contiguity method. The Queen’s contiguity method defines spatial contiguity by considering neighboring units that share a common border or vertex.
To ensure the validity of the analysis, the first step involves handling missing values. In this case, any missing data (NA) in the data frame will be dropped.
dengue_2023_tainan <- tidyr::drop_na(dengue_2023_tainan)Then, we derive contiguity:
wm_q <- dengue_2023_tainan %>%
mutate(nb = st_contiguity(geometry),
wt = st_weights(nb,
style = "W"),
.before = 1) wm_qSimple feature collection with 257 features and 12 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 120.0627 ymin: 22.89401 xmax: 120.2925 ymax: 23.09144
Geodetic CRS: TWD97
First 10 features:
nb
1 6, 118, 160
2 126, 128, 138, 167, 221
3 68, 69, 171, 180, 183, 184, 187, 199
4 94, 97, 100, 104, 181, 206
5 12, 13, 248, 254
6 1, 12, 13, 118, 160, 248
7 54, 98, 99, 200
8 9, 73, 75, 115, 125, 144, 156, 157, 165, 185
9 8, 110, 115, 125, 165
10 11, 159, 161, 165, 235, 257
wt VILLCODE
1 0.3333333, 0.3333333, 0.3333333 67000350032
2 0.2, 0.2, 0.2, 0.2, 0.2 67000270011
3 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125 67000370005
4 0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667 67000330004
5 0.25, 0.25, 0.25, 0.25 67000350028
6 0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667 67000350030
7 0.25, 0.25, 0.25, 0.25 67000370009
8 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 67000350017
9 0.2, 0.2, 0.2, 0.2, 0.2 67000350049
10 0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667 67000350018
COUNTYNAME TOWNNAME VILLNAME VILLENG COUNTYID COUNTYCODE TOWNID
1 臺南市 安南區 青草里 Qingcao Vil. D 67000 D06
2 臺南市 仁德區 保安里 Bao'an Vil. D 67000 D32
3 臺南市 中西區 赤嵌里 Chihkan Vil. D 67000 D08
4 臺南市 南區 大成里 Dacheng Vil. D 67000 D02
5 臺南市 安南區 城北里 Chengbei Vil. D 67000 D06
6 臺南市 安南區 城南里 Chengnan Vil. D 67000 D06
7 臺南市 中西區 法華里 Fahua Vil. D 67000 D08
8 臺南市 安南區 海南里 Hainan Vil. D 67000 D06
9 臺南市 安南區 國安里 Guo'an Vil. D 67000 D06
10 臺南市 安南區 溪心里 Xixin Vil. D 67000 D06
TOWNCODE OBSERVATIONS_SUM geometry
1 67000350 2 POLYGON ((120.1176 23.08387...
2 67000270 19 POLYGON ((120.2304 22.93544...
3 67000370 111 POLYGON ((120.2012 22.99966...
4 67000330 29 POLYGON ((120.1985 22.98147...
5 67000350 1 POLYGON ((120.1292 23.06512...
6 67000350 10 POLYGON ((120.1246 23.06904...
7 67000370 38 POLYGON ((120.2094 22.98452...
8 67000350 44 POLYGON ((120.175 23.02218,...
9 67000350 112 POLYGON ((120.1866 23.02766...
10 67000350 65 POLYGON ((120.1834 23.06086...
9.2 Performing global Moran’s I test
In the following code snippet, the global_moran() function is used to calculate the Moran’s I value. In contrast to the spdep package, the result, using the sfdep package, is presented as a tibble data.frame.
The hypotheses are as follows:
H0: There is no spatial autocorrelation in the data, and any observed spatial pattern is a result of random chance.
H1: There is a significant positive spatial autocorrelation in the data, indicating that similar values are more likely to be close to each other than would be expected by random chance.
Significance level: 0.05
global_moran_test(wm_q$OBSERVATIONS_SUM,
wm_q$nb,
wm_q$wt)
Moran I test under randomisation
data: x
weights: listw
Moran I statistic standard deviate = 12.703, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.464345173 -0.003906250 0.001358755
In the output, the p-value is reported as < 2.2e-16, which is essentially zero. Given the extremely low p-value which is lower than the alpha value of 0.05, we reject the null hypothesis which means that there is a significant positive spatial autocorrelation in the data. In addition, the positive Moran’s I statistic value of 0.464 indicate spatial clustering, suggesting that similar values are located near each other. In other words, there is a tendency for villages with similar levels of dengue cases to be located near each other on the map.
9.3 Performing global Moran’I permutation test
To enhance the reliability of our results, we conduct the global Moran’s I permutation test. Prior to running the simulation, we use set.seed() to ensure the reproducibility of the computations.
set.seed(1234)global_moran_perm(wm_q$OBSERVATIONS_SUM,
wm_q$nb,
wm_q$wt,
nsim = 99)
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 100
statistic = 0.46435, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided
In a Monte Carlo simulation of Moran’s I with 100 simulations, the observed Moran’s I statistic is 0.46435, indicating clustering of data points. The p-value is reported as < 2.2e-16 which is smaller than the alpha value of 0.05, indicating statistical significance. The alternative hypothesis is two-sided, suggesting that the observed spatial pattern is significantly different from what would be expected under spatial randomness. Therefore, we reject the null hypothesis of no spatial autocorrelation, confirming the presence of a significant spatial pattern in the data.
10. Local Spatial Autocorrelation
Local Spatial Autocorrelation, often measured using indices like Local Moran’s I or Getis-Ord Gi*, assesses the spatial clustering patterns at a local level within a study area. Unlike global spatial autocorrelation, which provides an overall measure of spatial pattern across the entire area, local spatial autocorrelation focuses on identifying clusters, outliers, and areas of localized similarity or dissimilarity.
Local Moran’s I
Local Moran’s I calculates the correlation between the value of a variable at a specific location and the average value of that variable in its neighboring locations.
10.1 Computing local Moran’s I
lisa <- wm_q %>%
mutate(local_moran = local_moran(
OBSERVATIONS_SUM, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)Let’s take a look at the content of the result.
lisaSimple feature collection with 257 features and 24 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 120.0627 ymin: 22.89401 xmax: 120.2925 ymax: 23.09144
Geodetic CRS: TWD97
# A tibble: 257 × 25
ii eii var_ii z_ii p_ii p_ii_sim p_folded_sim skewness
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1.40 0.0271 0.511 1.92 0.0546 0.02 0.01 -0.473
2 0.756 0.0142 0.151 1.91 0.0563 0.06 0.03 -0.764
3 0.0324 -0.0306 0.0558 0.267 0.790 0.72 0.36 0.797
4 -0.215 -0.0200 0.115 -0.576 0.565 0.56 0.28 -0.447
5 1.43 -0.0628 0.386 2.40 0.0163 0.02 0.01 -0.899
6 1.29 0.0420 0.185 2.91 0.00365 0.02 0.01 -0.325
7 0.326 0.0249 0.0917 0.995 0.320 0.34 0.17 -1.14
8 0.0212 -0.0240 0.0253 0.284 0.776 0.88 0.44 -0.365
9 0.402 0.0515 0.108 1.07 0.286 0.36 0.18 0.140
10 0.0871 -0.00205 0.00315 1.59 0.112 0.08 0.04 -0.435
# ℹ 247 more rows
# ℹ 17 more variables: kurtosis <dbl>, mean <fct>, median <fct>, pysal <fct>,
# nb <nb>, wt <list>, VILLCODE <chr>, COUNTYNAME <chr>, TOWNNAME <chr>,
# VILLNAME <chr>, VILLENG <chr>, COUNTYID <chr>, COUNTYCODE <chr>,
# TOWNID <chr>, TOWNCODE <chr>, OBSERVATIONS_SUM <int>,
# geometry <POLYGON [°]>
The output of local_moran() is a sf data.frame containing the columns ii, eii, var_ii, z_ii, p_ii, p_ii_sim, and p_folded_sim.
ii: Observed Local Moran’s I statistic.
eii: Expected value under spatial randomness.
var_ii: Variance of Local Moran’s I.
z_ii: Standard deviate of observed value.
p_ii: Two-tailed p-value for observed value.
p_ii_sim: Simulated two-tailed p-value.
p_folded_sim: Folded simulated p-value, useful for asymmetrical distributions.
10.2 Visualising local Moran’s I and p-value of local Moran’s I
Here is a choropleth map depicting the observed local Moran’s I values.
tmap_mode("plot")
tm_shape(lisa) +
tm_fill("ii", palette = "RdBu") +
tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "Local Moran's I of Frequency of Dengue Cases",
main.title.size = 0.8)
The choropleth shows there is evidence for both positive and negative ii values. However, it is useful to consider the p-values for each of these ii values. Here is a choropleth map depicting the p-value of local Moran’s I.
tmap_mode("plot")
tm_shape(lisa) +
tm_fill("p_ii_sim", palette = "RdBu") +
tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "p-value of Local Moran's I",
main.title.size = 0.8)
For p-values, the appropriate classification should be 0.001, 0.01, 0.05 and not significant instead of the default classification. This will be corrected in the comparison below.
For effective comparison, we plot the maps next to one another. The tmap with distribution based on town ID is also included for analysis.
tmap_mode("plot")
map1 <- tm_shape(lisa) +
tm_fill("ii", palette = "RdBu") +
tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "Local Moran's I of Frequency of Dengue Cases",
main.title.size = 0.8)
map2 <- tm_shape(lisa) +
tm_fill("p_ii", palette = "RdBu",
breaks = c(0, 0.001, 0.01, 0.05, 1),
labels = c("0.001", "0.01", "0.05", "Not sig")) +
tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "p-value of Local Moran's I",
main.title.size = 0.8)
tmap_arrange(map1, map2, ncol = 2)
tmap_mode('plot')
tm_shape(tainan_sf_confined) +
tm_polygons("TOWNID")
Upon comparing the two maps, it becomes evident that numerous villages exhibit positive Local Moran’s I values, indicative of clustering. However, the accompanying non-significant p-values of these same villages imply that this observed clustering may be purely coincidental, lacking statistical support for a systematic spatial pattern.
Furthermore, it can be noted that places exhibiting higher Local Moran’s I values also coincide with those having significant p-values. The combination of elevated Local Moran’s I values and significant p-values further strengthens the indication that these areas possess distinct spatial characteristics contributing to non-random clustering. These villages are located in towns D01, D02, D04, D06, D32 and D39.
10.3 Visualizing LISA map
The LISA map is a categorical representation illustrating outliers and clusters within a geographic area. Outliers come in two types: High-Low (HL) and Low-High (LH), each indicating areas with values significantly differing from their neighbors. Similarly, clusters are categorized as High-High (HH) and Low-Low (LL), representing spatial concentrations of similar values.
lisa_sig <- lisa %>%
filter(p_ii < 0.05)
tmap_mode("view")
tm_shape(lisa) +
tm_polygons(id="VILLENG") +
tm_borders(alpha = 0.5) +
tm_shape(lisa_sig) +
tm_fill("mean", id="VILLENG") +
tm_borders(alpha = 0.4)tmap_mode('plot')
tm_shape(tainan_sf_confined) +
tm_polygons("TOWNID")
The LISA map highlights distinct spatial clusters across the study area. In the outer regions, towns such as D06, D32, and D39 stand out with prevalent Low-Low clusters, indicating areas consistently characterized by lower dengue cases surrounded by similar low cases. This spatial pattern aligns seamlessly with our earlier observations from the choropleth map analysis of dengue cases per village in Section 8. Moving towards the central regions, a notable prevalence of High-High clusters becomes apparent. These clusters, observed in towns D01, D02, and the inner areas of towns D06 and D39, signify localized areas consistently exhibiting higher dengue cases surrounded by neighboring areas with high cases. Only a few villages fall into the low-high category, indicating limited instances of outliers with higher cases surrounded by lower cases. By clicking on individual villages, we are able to reveal their respective names and know whether they are clusters/outliers.
Hot Spot and Cold Spot Area Analysis
Hot Spot and Cold Spot Area Analysis employs spatial weights to pinpoint statistically significant hot spots and cold spots in a spatially weighted attribute. The method identifies locations with values that exhibit spatial clustering or dispersion in proximity to each other, determined by a calculated distance.
10.4 Derive spatial weight matrix
Local Gi* are distance-based spatial statistics. Hence, distance methods instead of contiguity methods should be used to derive the spatial weight matrix.
wm_idw <- dengue_2023_tainan %>%
mutate(nb = st_contiguity(geometry),
wts = st_inverse_distance(nb, geometry,
scale = 1,
alpha = 1),
.before = 1)10.5 Computing local Gi* statistics
Using the new spatial weight matrix, we compute the local Gi* statistics.
HCSA <- wm_idw %>%
mutate(local_Gi = local_gstar_perm(
OBSERVATIONS_SUM, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_Gi)
HCSASimple feature collection with 257 features and 20 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 120.0627 ymin: 22.89401 xmax: 120.2925 ymax: 23.09144
Geodetic CRS: TWD97
# A tibble: 257 × 21
gi_star e_gi var_gi p_value p_sim p_folded_sim skewness kurtosis nb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <nb>
1 -2.35 0.00295 1.96e-6 -1.93 0.0539 0.02 0.01 0.419 <int>
2 -2.06 0.00334 1.39e-6 -1.73 0.0829 0.04 0.02 1.08 <int>
3 0.354 0.00424 9.50e-7 0.0182 0.986 0.98 0.49 0.320 <int>
4 0.365 0.00362 1.38e-6 0.591 0.555 0.58 0.29 0.489 <int>
5 -2.65 0.00292 1.00e-6 -2.68 0.00744 0.02 0.01 0.877 <int>
6 -3.16 0.00351 1.18e-6 -3.03 0.00244 0.02 0.01 -0.0124 <int>
7 -1.25 0.00341 1.61e-6 -0.983 0.325 0.32 0.16 0.655 <int>
8 -0.284 0.00386 8.09e-7 -0.251 0.802 0.88 0.44 0.433 <int>
9 1.53 0.00437 1.43e-6 1.20 0.229 0.32 0.16 0.503 <int>
10 -1.47 0.00398 1.31e-6 -1.57 0.117 0.08 0.04 0.391 <int>
# ℹ 247 more rows
# ℹ 12 more variables: wts <list>, VILLCODE <chr>, COUNTYNAME <chr>,
# TOWNNAME <chr>, VILLNAME <chr>, VILLENG <chr>, COUNTYID <chr>,
# COUNTYCODE <chr>, TOWNID <chr>, TOWNCODE <chr>, OBSERVATIONS_SUM <int>,
# geometry <POLYGON [°]>
10.6 Visualizing Gi* and p-value of HCSA
Here is a choropleth map depicting the Gi* values.
tmap_mode("plot")
tm_shape(HCSA) +
tm_fill("gi_star", palette = "RdBu") +
tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "Gi* of Frequency of Dengue Cases",
main.title.size = 0.8,
main.title.position = "center")
Below is the choropleth map depicting the p-value of Gi* in these villages.
tmap_mode("plot")
tm_shape(HCSA) +
tm_fill("p_sim", palette = "RdBu") +
tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "p-value of Gi*",
main.title.size = 0.8,
main.title.position = "center")
For effective comparison, we plot the maps next to one another.
tmap_mode("plot")
map1 <- tm_shape(HCSA) +
tm_fill("gi_star", palette = "RdBu") +
tm_borders(alpha = 0.5) +
tm_view(set.zoom.limits = c(6,8)) +
tm_layout(main.title = "Gi* of Frequency of Dengue Cases",
main.title.size = 0.8,
main.title.position = "center")
map2 <- tm_shape(HCSA) +
tm_fill("p_sim", palette = "RdBu",
breaks = c(0, 0.001, 0.01, 0.05, 1),
labels = c("0.001", "0.01", "0.05", "Not sig")) +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "p-value of Gi*",
main.title.size = 0.8,
main.title.position = "center")
tmap_arrange(map1, map2, ncol = 2)
10.7 Visualising hot spot and cold spot areas
Now, we can visualize the statistically significant hot spot and cold spot areas by utilizing the relevant tmap functions.
HCSA_sig <- HCSA %>%
filter(p_sim < 0.05)
tmap_mode("view")
tm_shape(HCSA) +
tm_polygons(id="VILLENG") +
tm_borders(alpha = 0.5) +
tm_shape(HCSA_sig) +
tm_fill("gi_star", id="VILLENG") +
tm_borders(alpha = 0.4)Based on the confined map, regions toward the edges of the confined map display consistently low Gi* values, indicative of statistically significant cold spots for dengue fever cases. This spatial pattern suggests a lower risk or prevalence of the disease in the peripheral areas of Tainan. Conversely, positive Gi* values, representing hot spots, are concentrated in the central vicinity of the map. This clustering of hot spots implies areas with an elevated risk of dengue, warranting heightened public health interventions and monitoring efforts to effectively control the potential spread of the disease. By clicking on individual villages, we are able to reveal their respective names and know their Gi* values.
11. Emerging Hot Spot Analysis
Emerging Hot Spot Analysis (EHSA) is a spatio-temporal analysis method designed to find and describe the dynamic evolution of hot spot and cold spot areas over time.
11.1 Creating a time series cube
First, we make a time series cube.
dengue_2023_tainan_2 <- read_rds("../../data/rds/dengue_2023_tainan.rds")#removed because it is irrelevant
dengue_2023_tainan_2 <- dengue_2023_tainan_2 %>%
select(-NOTE)For this, we will need all possible combinations of unique village codes and epidemiology weeks, allowing you to analyze and explore spatio-temporal patterns across different villages and weeks.
village_week_combinations <- expand.grid(VILLCODE = unique(tainan_sf_confined$VILLCODE), EPIDEMIOLOGY_WEEK = seq(31, 50))Next we prepare a dataset containing information about the number of dengue fever observations for each combination of village code and epidemiology week. The code also ensures those with zero observations are accounted for in the analysis.
grouped_data_2 <- dengue_2023_tainan_2 %>%
group_by(VILLCODE, EPIDEMIOLOGY_WEEK) %>%
summarize(OBSERVATIONS = n())
complete_data <- left_join(village_week_combinations
, grouped_data_2, by = c("VILLCODE", "EPIDEMIOLOGY_WEEK")) %>%
replace(is.na(.), 0) # Replace NA values with 0
complete_data <- complete_data %>%
select(-geometry)complete_data <- as.tibble(complete_data)Here we create the spacetime cube.
OBSERVATIONS_st <- spacetime(complete_data, tainan_sf_confined,
.loc_col = "VILLCODE",
.time_col = "EPIDEMIOLOGY_WEEK")Then, we verify if the resulting object is recognized as a spacetime cube:
is_spacetime_cube(OBSERVATIONS_st)[1] TRUE
11.2 Deriving spatial weights
Next, we derive a spatial weight matrix with inverse distance weights.
OBSERVATIONS_nb <- OBSERVATIONS_st %>%
activate("geometry") %>%
mutate(nb = include_self(st_contiguity(geometry)),
wt = st_inverse_distance(nb, geometry,
scale = 1,
alpha = 1),
.before = 1) %>%
set_nbs("nb") %>%
set_wts("wt")Here’s a look at the resulting object.
head(OBSERVATIONS_nb)# A tibble: 6 × 5
VILLCODE EPIDEMIOLOGY_WEEK OBSERVATIONS nb wt
<chr> <int> <dbl> <list> <list>
1 67000350032 31 0 <int [4]> <dbl [4]>
2 67000270011 31 1 <int [6]> <dbl [6]>
3 67000370005 31 0 <int [9]> <dbl [9]>
4 67000330004 31 0 <int [7]> <dbl [7]>
5 67000350028 31 0 <int [5]> <dbl [5]>
6 67000350030 31 0 <int [8]> <dbl [8]>
11.3 Computing Gi*
Now, we can compute the local Gi* for each village, grouped by epidemiology week.
gi_stars <- OBSERVATIONS_nb %>%
group_by(EPIDEMIOLOGY_WEEK) %>%
mutate(gi_star = local_gstar_perm(
OBSERVATIONS, nb, wt)) %>%
tidyr::unnest(gi_star)gi_stars# A tibble: 5,160 × 13
# Groups: EPIDEMIOLOGY_WEEK [20]
VILLCODE EPIDEMIOLOGY_WEEK OBSERVATIONS nb wt gi_star e_gi var_gi
<chr> <int> <dbl> <lis> <lis> <dbl> <dbl> <dbl>
1 670003500… 31 0 <int> <dbl> -0.771 0.00227 1.22e-5
2 670002700… 31 1 <int> <dbl> 0.366 0.00385 1.04e-5
3 670003700… 31 0 <int> <dbl> -0.382 0.00295 8.23e-6
4 670003300… 31 0 <int> <dbl> -0.180 0.00324 1.41e-5
5 670003500… 31 0 <int> <dbl> -0.846 0.00261 1.83e-5
6 670003500… 31 0 <int> <dbl> -1.04 0.00284 7.71e-6
7 670003700… 31 0 <int> <dbl> -0.846 0.00260 1.50e-5
8 670003500… 31 0 <int> <dbl> -0.662 0.00318 6.44e-6
9 670003500… 31 0 <int> <dbl> 0.0812 0.00257 1.12e-5
10 670003500… 31 0 <int> <dbl> -0.981 0.00289 1.46e-5
# ℹ 5,150 more rows
# ℹ 5 more variables: p_value <dbl>, p_sim <dbl>, p_folded_sim <dbl>,
# skewness <dbl>, kurtosis <dbl>
12. Mann-Kendall Test
The Mann-Kendall test is a non-parametric statistical test used to assess the presence of a trend in a time series or spatial data. We will perform the Mann-Kendall test to assess the trend in dengue cases for each village and determine whether these trends are statistically significant. We will be conducting this test on the top 3 villages with the most number of dengue cases.
The hypotheses are as follows:
H0: There is no trend in the data over time.
H1: There is the presence of a trend, either increasing or decreasing.
Significance value: 0.05
The code below assesses the trend for Fuhua Village, Anfu Village and Xiqi Village.
fuhua <- gi_stars %>%
ungroup() %>%
filter(VILLCODE == "67000310037") |>
select(VILLCODE, EPIDEMIOLOGY_WEEK, gi_star)anfu <- gi_stars %>%
ungroup() %>%
filter(VILLCODE == "67000350050") |>
select(VILLCODE, EPIDEMIOLOGY_WEEK, gi_star)xiqi <- gi_stars %>%
ungroup() %>%
filter(VILLCODE == "67000350040") |>
select(VILLCODE, EPIDEMIOLOGY_WEEK, gi_star)Next, we plot the result using ggplot2 functions.
f <- ggplot(data = fuhua, aes(x = EPIDEMIOLOGY_WEEK, y = gi_star)) +
geom_line(color = "black", size = 1, alpha = 0.8) +
geom_point(color = "blue", size = 1.5) + # Add points for emphasis
labs(title = "Trend of Dengue Cases in Fuhua Village",
x = "Epidemiology Week",
y = "Local Gi* Value") +
theme_minimal() +
theme(plot.title = element_text(size = 8),
axis.title = element_text(size = 8),
axis.text = element_text(size = 8),
plot.caption = element_text(size = 8, color = "gray"))
ggplotly(f)a <- ggplot(data = anfu, aes(x = EPIDEMIOLOGY_WEEK, y = gi_star)) +
geom_line(color = "black", size = 1, alpha = 0.8) +
geom_point(color = "blue", size = 1.5) + # Add points for emphasis
labs(title = "Trend of Dengue Cases in Anfu Village",
x = "Epidemiology Week",
y = "Local Gi* Value") +
theme_minimal() +
theme(plot.title = element_text(size = 8),
axis.title = element_text(size = 8),
axis.text = element_text(size = 8),
plot.caption = element_text(size = 8, color = "gray"))
ggplotly(a)x <- ggplot(data = xiqi, aes(x = EPIDEMIOLOGY_WEEK, y = gi_star)) +
geom_line(color = "black", size = 1, alpha = 0.8) +
geom_point(color = "blue", size = 1.5) + # Add points for emphasis
labs(title = "Trend of Dengue Cases in Xiqi Village",
x = "Epidemiology Week",
y = "Local Gi* Value") +
theme_minimal() +
theme(plot.title = element_text(size = 8),
axis.title = element_text(size = 8),
axis.text = element_text(size = 8),
plot.caption = element_text(size = 8, color = "gray"))
ggplotly(x)Now, we take a look at sl value.
fuhua %>%
summarise(mk = list(
unclass(
Kendall::MannKendall(gi_star)))) %>%
tidyr::unnest_wider(mk)# A tibble: 1 × 5
tau sl S D varS
<dbl> <dbl> <dbl> <dbl> <dbl>
1 -0.232 0.163 -44 190. 950
The sl value of 0.1629844 is positive and implies a upward trend in the gi_star variable for the given time period. However, this trend is not statistically significant at the chosen significance level of 0.05.
anfu %>%
summarise(mk = list(
unclass(
Kendall::MannKendall(gi_star)))) %>%
tidyr::unnest_wider(mk)# A tibble: 1 × 5
tau sl S D varS
<dbl> <dbl> <dbl> <dbl> <dbl>
1 -0.589 0.000317 -112 190. 950
The sl value of 0.0003166109 is positive and indicates a very slight upward trend in the gi_star variable for the given time period. This trend is statistically significant at the chosen significance level of 0.05.
xiqi %>%
summarise(mk = list(
unclass(
Kendall::MannKendall(gi_star)))) %>%
tidyr::unnest_wider(mk)# A tibble: 1 × 5
tau sl S D varS
<dbl> <dbl> <dbl> <dbl> <dbl>
1 -0.642 0.0000865 -122 190. 950
The sl value of 0.00008645681 is positive and indicates a very slight upward trend in the gi_star variable for the given time period. This trend is statistically significant at the chosen significance level of 0.05.
We can replicate this process for each village by using the group_by() function of dplyr package.
all_vill <- gi_stars %>%
group_by(VILLCODE) %>%
summarise(mk = list(
unclass(
Kendall::MannKendall(gi_star)))) %>%
tidyr::unnest_wider(mk)mk_all <- all_vill %>%
arrange(sl, abs(tau)) %>%
slice(1:5)mk_all# A tibble: 5 × 6
VILLCODE tau sl S D varS
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 67000310001 -0.716 0.0000119 -136 190. 950
2 67000320040 -0.705 0.0000160 -134 190. 950
3 67000310025 -0.684 0.0000285 -130 190. 950
4 67000320004 -0.653 0.0000659 -124 190. 950
5 67000350040 -0.642 0.0000865 -122 190. 950
13. Emerging Hotspot Analysis
13.1 Performing emerging hotspot analysis
Emerging Hot Spot Analysis (EHSA) is a spatio-temporal analysis method used to identify and describe areas where significant clustering or hot spots are changing over time.
ehsa <- emerging_hotspot_analysis(
x = OBSERVATIONS_st,
.var = "OBSERVATIONS",
k = 1,
nsim = 99
)13.2 Visualising the distribution of EHSA classes
Next, we use ggplot2 functions to reveal the distribution of EHSA classes in the form of a bar chart.
ggplot(data = ehsa,
aes(x = classification)) +
geom_bar()
Based on the bar chart, the highest bar is oscillating hot spots followed by oscillating cold spots as second highest.
13.3 Visualising EHSA
tainan_ehsa <- tainan_sf_confined %>%
left_join(ehsa,
by = join_by(VILLCODE == location))ehsa_sig <- tainan_ehsa %>%
filter(p_value < 0.05)
tmap_mode("view")
tm_shape(tainan_ehsa) +
tm_polygons(id="VILLENG") +
tm_borders(alpha = 0.5) +
tm_shape(ehsa_sig) +
tm_fill("classification", id="VILLENG") +
tm_borders(alpha = 0.4)The majority of the identified patterns exhibit oscillating hotspots. These statistically significant hotspots, predominantly situated in the northern part of the confined map (towns D06 and D39), were observed during the final weeks of the dengue outbreak. Notably, these areas have a historical context of being statistically significant cold spots during prior weeks. Conversely, oscillating cold spots are more prominent in the southern regions of the confined map (towns D02 and D32) and exhibit the opposite characteristics. These areas were statistically significant hot spots in the earlier weeks and transformed into cold spots in the later weeks of the dengue outbreak. By clicking on individual villages, we are able to reveal their respective names and know the classifiction of specific villages.
14. References
Learning from senior: AILYS TEE XYNYN and JENNIFER POERNOMO.
In-class exercise 5
ChatGPT